A Semi-Automatic, Iterative Method for Creating a Domain-Specific Treebank

نویسندگان

  • Corina Dima
  • Erhard W. Hinrichs
چکیده

In this paper we present the development process of NLP-QT, a question treebank that will be used for data-driven parsing in the context of a domain-specific QA system for querying NLP resource metadata. We motivate the need to build NLP-QT as a resource in its own right, by comparing the Penn Treebank-style annotation scheme used for QuestionBank (Judge et al., 2006) with the modified NP annotation for the Penn Treebank introduced by Vadas and Curran (2007). We argue that this modified annotation scheme provides a better interface representation for semantic interpretation and show how it can be incorporated into the NLP-QT resource, without significant loss in parser performance. The parsing experiments reported in the paper confirm the feasibility of an iterative, semi-automatic construction of the NLP-QT resource similar to the approach taken for QuestionBank. At the same time, we propose to improve the iterative refinement technique used for QuestionBank by adopting Hwa (2001)’s heuristics for selecting additional material to be handcorrected and added to the data set at each iteration.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارائۀ راهکاری قاعده‌مند جهت تبدیل خودکار درخت تجزیۀ نحوی وابستگی به درخت تجزیۀ نحوی ساخت‌سازه‌ای برای زبان فارسی

In this paper, an automatic method in converting a dependency parse tree into an equivalent phrase structure one, is introduced for the Persian language. In first step, a rule-based algorithm was designed. Then, Persian specific dependency-to-phrase structure conversion rules merged to the algorithm. Subsequently, the Persian dependency treebank with about 30,000 sentences was used as an input ...

متن کامل

تصحیح خودکار خطا در درخت بانک نحوی با استفاده از یادگیری ماشینی انتقال محور

The Treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks such as speech recognition, spoken language systems, parsing and machine translation. Treebank can be developded in different ways that could be, generally, categorized in manually and statistical approaches. While the resulted Treebank in each of these methods has the annotation error,...

متن کامل

Semi-Automatic Construction of a Question Treebank

Abstract A method for the semi-automatic construction of a question treebank is presented. We exploit linguistic knowledge like grammatical functions, constituent structure and the relatively strict word order of English encoded in the Penn Treebank to generate semi-automatically questions. The outcome is a treebank of questions which might be useful for developing better tagging and parsing mo...

متن کامل

Wide-Coverage Grammar Extraction from Thai Treebank

Parsing is an important step for natural language understanding, including phrase alignment for supporting statistical machine translation. Ability on analysing real text by parser strongly depends on grammar. Treebank could be one of the sources for grammar extraction. However, treebank construction largely relies on human annotators intuitions. Different intuitions from multiple annotators br...

متن کامل

Iterative Treebank Refinement

Treebanks are a valuable resource for the training of parsers that perform automatic annotation of unseen data. It has been shown that changes in the representation of linguistic annotation have an impact on the performance of a certain annotation task. We focus on the task of Topological Field Parsing for German using Probabilistic Context-Free Grammars in the present research. We investigate ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011